AITopics | initialization strategy

Training Your Image Restoration Network Better with Random Weight Network as Optimization Function

Neural Information Processing Systems, Apr-24-2026, 06:51:12 GMT

The rapid progress made in deep learning-based image restoration has been largely attributed to the availability of high-quality, large-scale datasets and advanced network structures. However, optimization functions such as L1 and L2 remain the de facto choice. In this study, we propose to investigate new optimization functions to improve image restoration performance. Our key insight is that "random weight network can be acted as a constraint for training better image restoration networks". However, not all random weight networks are suitable as constraints.
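
The idea in the abstract above can be sketched as a loss that compares a restored image and its target after both pass through a frozen, randomly initialized network. The following is a minimal sketch only: the two-layer ReLU network, the Gaussian initialization, the L1 distance in feature space, and every function name are assumptions made for illustration, not the architecture or loss used in the paper.

```python
import numpy as np

def make_random_network(in_dim, hidden_dim, out_dim, seed=0):
    """Draw fixed random weights for a small two-layer network (hypothetical choice)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, hidden_dim))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), size=(hidden_dim, out_dim))
    return w1, w2

def random_features(x, params):
    """Map flattened images x of shape (batch, in_dim) through the frozen network."""
    w1, w2 = params
    return np.maximum(x @ w1, 0.0) @ w2  # linear layer, ReLU, linear layer

def random_network_loss(restored, target, params):
    """Mean absolute difference between the random features of the two inputs."""
    diff = random_features(restored, params) - random_features(target, params)
    return float(np.mean(np.abs(diff)))
```

In a training loop, the weights in `params` would stay fixed while only the restoration network is updated, so the random network acts purely as a constraint on the output.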

artificial intelligence, machine learning, random weight network, (14 more...)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

IQP Born Machines under Data-dependent and Agnostic Initialization Strategies

Lerch, Sacha, Bowles, Joseph, Puig, Ricard, Armengol, Erik, Holmes, Zoë, Thanasilp, Supanut

arXiv.org Machine Learning, Mar-17-2026

Quantum circuit Born machines based on instantaneous quantum polynomial-time (IQP) circuits are natural candidates for quantum generative modeling, both because of their probabilistic structure and because IQP sampling is provably classically hard in certain regimes. Recent proposals focus on training IQP-QCBMs using Maximum Mean Discrepancy (MMD) losses built from low-body Pauli-$Z$ correlators, but the effect of initialization on the resulting optimization landscape remains poorly understood. In this work, we address this by first proving that the MMD loss landscape suffers from barren plateaus for random full-angle-range initializations of IQP circuits. We then establish lower bounds on the loss variance for identity and an unbiased data-agnostic initialization. We additionally consider a data-dependent initialization that is better aligned with the target distribution and, under suitable assumptions, yields provable gradients and generally converges more quickly to a good minimum (as indicated by our training of circuits with 150 qubits on genomic data). Finally, as a by-product, the developed variance lower bound framework is applicable to a general class of non-linear losses, offering a broader toolset for analyzing warm-starts in quantum machine learning.
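
For readers unfamiliar with the MMD loss mentioned above, a generic sample-based estimate can be sketched as follows. This is an illustration under stated assumptions: it uses a Gaussian kernel on bitstring samples and the biased (V-statistic) estimator, and the function names and the `sigma` bandwidth are hypothetical. It is not the Pauli-$Z$ correlator form of the loss that the abstract refers to.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)
```

Here `x` would hold bitstrings sampled from the model and `y` bitstrings from the data. The estimate is zero when the two sample sets coincide and grows as they separate.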

artificial intelligence, initialization, machine learning, (18 more...)

2603.14576

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Government (0.45)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

043f0503c4f652c737add3690aa5d12c-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 21:53:48 GMT

computer vision and pattern recognition, optimization function, random weight network, (10 more...)

Country: Asia > China (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

Devansh Arpit, Víctor Campos, Yoshua Bengio

Neural Information Processing Systems, Feb-14-2026, 19:36:49 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, initialization, machine learning, (18 more...)

Country: North America > Canada > Quebec > Montreal (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

f52db9f7c0ae7017ee41f63c2a7353bc-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 03:10:59 GMT

feedforward dim 512, representation dim 256, transformer sublayer, (12 more...)

Genre: Overview (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

419f72cbd568ad62183f8132a3605a2a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-10-2026, 23:35:24 GMT

acquisition function, implementation, optimization, (14 more...)

Country:

Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
(2 more...)

If You Want to Be Robust, Be Wary of Initialization

Neural Information Processing Systems, Feb-10-2026, 01:35:23 GMT

We introduce a theoretical framework bridging the connection between initialization strategies and a network's resilience to adversarial perturbations.

data mining, machine learning, natural language, (21 more...)

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Italy > Sardinia (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Africa > Middle East > Morocco (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Information Technology > Security & Privacy (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

My document

Neural Information Processing Systems, Feb-9-2026, 21:46:27 GMT

In this paper, we present DiffSketcher, an innovative algorithm that creates vectorized free-hand sketches using natural language input.

machine learning, natural language, sketch, (20 more...)

Country:

North America > United States (0.14)
Asia > China > Hong Kong (0.04)
Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Subquadratic Overparameterization for Shallow Neural Networks

Neural Information Processing Systems, Dec-24-2025, 04:36:18 GMT

Overparameterization refers to the important phenomenon where the width of a neural network is chosen such that learning algorithms can provably attain zero loss in nonconvex training. The existing theory establishes such global convergence using various initialization strategies, training modifications, and width scalings. In particular, the state-of-the-art results require the width to scale quadratically with the number of training data under standard initialization strategies used in practice for best generalization performance. In contrast, the most recent results obtain linear scaling either by requiring initializations that lead to lazy training or by training only a single layer. In this work, we provide an analytical framework that allows us to adopt standard initialization strategies, possibly avoid lazy training, and train all layers simultaneously in basic shallow neural networks while attaining a desirable subquadratic scaling on the network width. We achieve the desiderata via the Polyak-Łojasiewicz condition, smoothness, and standard assumptions on data, and use tools from random matrix theory.
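
For reference, the Polyak-Łojasiewicz condition named above is usually stated as below. The symbols are notation chosen here rather than taken from the paper: $f$ is the training loss, $w$ the parameters, $f^*$ the minimum loss value, $\mu > 0$ the PL constant, and $\beta$ the smoothness constant.

```latex
% Polyak-Lojasiewicz (PL) condition with constant \mu > 0:
\frac{1}{2}\,\lVert \nabla f(w) \rVert^{2} \;\ge\; \mu\,\bigl(f(w) - f^{*}\bigr)
\quad \text{for all } w .

% If f is also \beta-smooth, gradient descent with step size 1/\beta,
% w_{t+1} = w_t - \tfrac{1}{\beta}\nabla f(w_t), contracts the loss gap linearly:
f(w_{t+1}) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{\beta}\Bigr)\bigl(f(w_t) - f^{*}\bigr).
```

The condition does not require convexity, which is why it suits the nonconvex training setting the abstract describes.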

initialization strategy, name change, subquadratic overparameterization, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

A comparison between initialization strategies for the infinite hidden Markov model

Cortese, Federico P., Rossini, Luca

arXiv.org Machine Learning, Dec-4-2025

Infinite hidden Markov models provide a flexible framework for modelling time series with structural changes and complex dynamics, without requiring the number of latent states to be specified in advance. This flexibility is achieved through the hierarchical Dirichlet process prior, while efficient Bayesian inference is enabled by the beam sampler, which combines dynamic programming with slice sampling to truncate the infinite state space adaptively. Despite extensive methodological developments, the role of initialization in this framework has received limited attention. This study addresses this gap by systematically evaluating initialization strategies commonly used for finite hidden Markov models and assessing their suitability in the infinite setting. Results from both simulated and real datasets show that distance-based clustering initializations consistently outperform model-based and uniform alternatives, the latter being the most widely adopted in the existing literature.
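
A distance-based clustering initialization of the kind compared above can be sketched with plain k-means on the observations, with the cluster labels used as the initial hidden-state sequence. This is a sketch under assumptions: k-means is only one possible distance-based method, the initial number of states `n_states` is a user choice, and the function name is hypothetical. The abstract does not specify the exact procedures evaluated.

```python
import numpy as np

def kmeans_init_states(obs, n_states, n_iter=20, seed=0):
    """Assign each observation an initial state label via Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float).reshape(len(obs), -1)
    # Start from n_states distinct observations chosen at random.
    centers = obs[rng.choice(len(obs), size=n_states, replace=False)]
    for _ in range(n_iter):
        # Distance from every observation to every center.
        dists = np.linalg.norm(obs[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_states):
            if np.any(labels == k):  # leave empty clusters where they are
                centers[k] = obs[labels == k].mean(axis=0)
    return labels
```

The returned labels would then seed the sampler in place of a uniform random assignment of states.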

initialization, initialization method, initialization strategy, (15 more...)

2512.03777

Country: